Using articulatory measurements to learn better acoustic features
Authors
Abstract
We summarize recent work on learning improved acoustic features using articulatory measurements that are available at training time but not at test time. The goal is to improve recognition using articulatory information, but without explicitly solving the difficult acoustics-to-articulation inversion problem. We formulate the problem as learning a (linear or nonlinear) transformation of standard acoustic features such that the transformed vectors are maximally correlated with some (linear or nonlinear) transformation of the articulatory measurements. This formulation leads to the standard statistical technique of canonical correlation analysis (CCA) and its nonlinear extension, kernel CCA. Along the way, we have developed a scalable variant of kernel CCA and a new type of nonlinear CCA via deep neural networks (deep CCA). The learned features can improve phonetic classification and recognition and generalize across speakers; deep CCA, in particular, shows promise over kernel CCA.
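As a concrete illustration of the linear case, the sketch below learns a CCA projection of acoustic frames from paired acoustic and articulatory training data and then applies only the acoustic projection at test time. It is a minimal sketch, not the paper's implementation: scikit-learn's CCA, the 39- and 16-dimensional feature sizes, the 10 canonical components, the feature concatenation, and the logistic-regression frame classifier are all illustrative assumptions, and random arrays stand in for real MFCC frames, articulator trajectories, and phone labels.

```python
import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-ins for real paired training data (acoustic view + articulatory view).
X_train = rng.standard_normal((2000, 39))   # acoustic frames, e.g. MFCCs
Z_train = rng.standard_normal((2000, 16))   # articulatory measurements, e.g. EMA
y_train = rng.integers(0, 40, 2000)         # phone labels for the frames
X_test = rng.standard_normal((500, 39))     # at test time, acoustics only

# Fit CCA on the two views; the articulatory view is needed only here.
cca = CCA(n_components=10)
cca.fit(X_train, Z_train)

# Project the acoustics into the correlated subspace and append the
# projection to the original features before classification.
X_train_aug = np.hstack([X_train, cca.transform(X_train)])
X_test_aug = np.hstack([X_test, cca.transform(X_test)])

# Any frame classifier can consume the augmented features.
clf = LogisticRegression(max_iter=1000).fit(X_train_aug, y_train)
predictions = clf.predict(X_test_aug)
```

The key property is that the articulatory view constrains which acoustic directions are kept during training, yet nothing articulatory is required once the projection has been learned.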
Similar papers
Kernel CCA for multi-view learning of acoustic features using articulatory measurements
We consider the problem of learning transformations of acoustic feature vectors for phonetic frame classification, in a multi-view setting where articulatory measurements are available at training time but not at test time. Canonical correlation analysis (CCA) has previously been used to learn linear transformations of the acoustic features that are maximally correlated with articulatory measur...
Multi-view Acoustic Feature Learning Using Articulatory Measurements
We consider the problem of learning a linear transformation of acoustic feature vectors for phonetic frame classification, in a setting where articulatory measurements are available at training time. We use the acoustic and articulatory data together in a multi-view learning approach, in particular using canonical correlation analysis to learn linear transformations of the acoustic features tha...
Multiview Representation Learning via Deep CCA for Silent Speech Recognition
Silent speech recognition (SSR) converts non-audio information such as articulatory (tongue and lip) movements to text. Articulatory movements generally have less information than acoustic features for speech recognition, and therefore, the performance of SSR may be limited. Multiview representation learning, which can learn better representations by analyzing multiple information sources simul...
Combining acoustic and articulatory feature information for robust speech recognition
The idea of using articulatory representations for automatic speech recognition (ASR) continues to attract much attention in the speech community. Representations which are grouped under the label "articulatory" include articulatory parameters derived by means of acoustic-articulatory transformations (inverse filtering), direct physical measurements or classification scores for pseudo-articul...
Articulatory-to-Acoustic Conversion with Cascaded Prediction of Spectral and Excitation Features Using Neural Networks
This paper presents an articulatory-to-acoustic conversion method using electromagnetic midsagittal articulography (EMA) measurements as input features. Neural networks, including feed-forward deep neural networks (DNNs) and recurrent neural networks (RNNs) with long short-term memory (LSTM) cells, are adopted to map EMA features towards not only spectral features (i.e. mel-cepstra) but al...
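For the deep CCA variant mentioned in the abstract (and in the silent speech recognition entry above), the two linear projections are replaced by neural networks trained so that their outputs are maximally correlated. The sketch below shows one way to compute the minibatch correlation objective; it is an illustrative sketch, not the authors' implementation: PyTorch, the layer sizes, the ridge term eps, the batch size, and the use of all output dimensions (rather than a top-k subset of canonical correlations) are assumptions, and random tensors stand in for real paired frames.

```python
import torch
import torch.nn as nn

def dcca_loss(H1, H2, eps=1e-4):
    """Negative total canonical correlation of two (batch, dim) view outputs."""
    m = H1.size(0)
    H1c = H1 - H1.mean(dim=0, keepdim=True)   # center each view over the batch
    H2c = H2 - H2.mean(dim=0, keepdim=True)
    S11 = H1c.t() @ H1c / (m - 1) + eps * torch.eye(H1.size(1))
    S22 = H2c.t() @ H2c / (m - 1) + eps * torch.eye(H2.size(1))
    S12 = H1c.t() @ H2c / (m - 1)
    # Inverse square roots of the (regularized) within-view covariances.
    e1, V1 = torch.linalg.eigh(S11)
    e2, V2 = torch.linalg.eigh(S22)
    S11_inv_sqrt = V1 @ torch.diag(e1.clamp_min(eps).rsqrt()) @ V1.t()
    S22_inv_sqrt = V2 @ torch.diag(e2.clamp_min(eps).rsqrt()) @ V2.t()
    T = S11_inv_sqrt @ S12 @ S22_inv_sqrt
    # Total correlation is the trace norm (sum of singular values) of T.
    return -torch.linalg.svdvals(T).sum()

# Two small view networks: acoustic (39-d) and articulatory (16-d) inputs.
f_acoustic = nn.Sequential(nn.Linear(39, 256), nn.ReLU(), nn.Linear(256, 50))
f_artic = nn.Sequential(nn.Linear(16, 256), nn.ReLU(), nn.Linear(256, 50))
opt = torch.optim.Adam(
    list(f_acoustic.parameters()) + list(f_artic.parameters()), lr=1e-3
)

x = torch.randn(512, 39)   # acoustic minibatch (stand-in for real frames)
z = torch.randn(512, 16)   # paired articulatory minibatch
for _ in range(10):
    loss = dcca_loss(f_acoustic(x), f_artic(z))
    opt.zero_grad()
    loss.backward()
    opt.step()

# At test time only the acoustic network is needed; its output is the feature.
test_features = f_acoustic(torch.randn(8, 39)).detach()
```

Regularizing the within-view covariances with the eps term keeps the inverse square roots well conditioned on small minibatches; as with the linear case, the articulatory network is discarded once training is done.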